Supplementary Information for: Identifying Mislabeled Data using the Area Under the Margin Ranking

Neural Information Processing Systems

All tables report the mean and standard deviation over 4 trials with different random seeds. We train ResNet-50 models on this dataset from scratch; because this dataset is smaller than ImageNet, we train the models for 180 epochs. The learning rate is dropped by a factor of 10 at epochs 150 and 225. All other hyperparameters match the original training scheme.
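The step schedule described above can be sketched as a small helper. The milestones and decay factor come from the text; the base learning rate of 0.1 is an assumption (a common default for ResNet training), and this function is illustrative, not the authors' code:

```python
def learning_rate(epoch, base_lr=0.1, milestones=(150, 225), gamma=0.1):
    """Step schedule: multiply base_lr by gamma once per milestone passed.

    base_lr=0.1 is an assumed default, not taken from the paper.
    """
    drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** drops

print(learning_rate(0))    # base rate before any drop
print(learning_rate(160))  # after the first drop at epoch 150
```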


Review for NeurIPS paper: Identifying Mislabeled Data using the Area Under the Margin Ranking


The authors use the margin ranking to find data with noisy labels. As far as we know, the concept of a margin has long been used for classification tasks, e.g., face recognition [1], [2] and semi-supervised learning [3]. The methods of [4], which exploit the memorization effect to select confident (small-loss) samples, also share a similar idea: data with small loss are likely to have clean labels and also tend to have a larger margin. The authors omit discussion of and comparison with these existing works.
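The small-loss selection idea the review refers to can be sketched in a few lines. The function name and the keep ratio are illustrative, not from any cited method:

```python
def select_small_loss(losses, keep_ratio=0.7):
    """Return indices of the keep_ratio fraction of samples with smallest loss.

    Samples with small loss are treated as likely-clean; keep_ratio is an
    assumed hyperparameter, typically tied to the estimated noise rate.
    """
    k = int(len(losses) * keep_ratio)
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]

print(select_small_loss([0.2, 3.1, 0.5, 2.8, 0.1], keep_ratio=0.6))  # [4, 0, 2]
```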


Review for NeurIPS paper: Identifying Mislabeled Data using the Area Under the Margin Ranking


This paper proposes a simple strategy for identifying training samples that may be mislabeled. The paper received mixed reviews from the reviewers, with one reviewer in particular strongly arguing that simple and effective methods are deserving of publication, while other reviewers were concerned that the approach is too straightforward. Weighing these arguments, it was felt in discussion that this paper could be of interest as a straightforward approach to deal with the important problem of label noise.


Identifying Mislabeled Data using the Area Under the Margin Ranking


Not all data in a typical training set help with generalization; some samples can be overly ambiguous or outright mislabeled. This paper introduces a new method to identify such samples and mitigate their impact when training neural networks. At the heart of our algorithm is the Area Under the Margin (AUM) statistic, which exploits differences in the training dynamics of clean and mislabeled samples. A simple procedure, adding an extra class populated with purposefully mislabeled threshold samples, learns an AUM upper bound that isolates mislabeled data. This approach consistently improves upon prior work on synthetic and real-world datasets.
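A minimal sketch of the AUM statistic, assuming the margin at each measurement is the assigned-class logit minus the largest other logit, averaged over training epochs (logits here are plain Python lists; the toy values are invented for illustration):

```python
def margin(logits, assigned):
    """Margin at one epoch: assigned-class logit minus the largest other logit."""
    other = max(z for i, z in enumerate(logits) if i != assigned)
    return logits[assigned] - other

def aum(logits_per_epoch, assigned):
    """Area Under the Margin: the sample's margin averaged over epochs."""
    margins = [margin(z, assigned) for z in logits_per_epoch]
    return sum(margins) / len(margins)

# A clean sample's assigned logit tends to grow during training: positive AUM.
clean = [[2.0, 0.5, 0.1], [3.0, 0.4, 0.0]]
# A mislabeled sample is pulled toward its (hidden) true class: negative AUM.
noisy = [[0.2, 2.5, 0.1], [0.1, 3.0, 0.0]]
print(aum(clean, 0))  # positive (~2.05)
print(aum(noisy, 0))  # negative
```

Ranking samples by AUM and thresholding, e.g., against the purposefully mislabeled threshold samples described above, is what isolates the suspect data.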